You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
disable LiteLLM retry amplification for memory-compressor summarization calls
add a local extractive fallback summary when compressor LLM calls fail or time out
preserve recent scan messages while avoiding long-scan stalls caused by repeated summarization failures
Why
This addresses the reliability failure mode in #470 where long-context scans can get stuck when the memory compressor repeatedly hits LiteLLM timeouts. The fallback keeps the scan moving while retaining ordered message previews and recent operational context.
This PR fixes a reliability failure in long-context scans where repeated LiteLLM summarization timeouts could stall the memory compressor. It adds num_retries=0 to prevent retry amplification and introduces _build_fallback_summary, an extractive local summarizer that preserves head/tail message previews when the LLM call fails.
num_retries=0 is passed to litellm.completion so a single timeout does not silently trigger multiple retries.
_build_fallback_summary returns a context_summary message with up to 12 sampled previews (head + tail) when the LLM raises any exception; the existing exception handler now delegates to it instead of returning messages[0].
The empty-response branch (if not summary.strip(): return messages[0]) was not updated to use the new fallback, leaving one failure mode — an LLM that responds with blank content — still returning a raw uncompressed old message.
Confidence Score: 3/5
The change is almost complete but one branch was not updated consistently with the rest.
The empty-summary guard at line 178 still returns messages[0] directly, which is exactly the raw-old-message return the PR is designed to eliminate. An LLM that responds with an empty string triggers this path and re-creates the context inflation the fix targets. The rest of the change — disabling retries and the fallback builder — is correct and well-tested.
strix/llm/memory_compressor.py line 178 needs the same _build_fallback_summary treatment applied to the exception handler above it.
Important Files Changed
Filename
Overview
strix/llm/memory_compressor.py
Adds num_retries=0 to suppress LiteLLM retry amplification and introduces _build_fallback_summary as an extractive local fallback on exception; the empty-LLM-response branch at line 178 still returns messages[0] (a raw old message) instead of the fallback, leaving one failure mode unaddressed.
tests/llm/test_memory_compressor.py
New test file covering retry-disable, timeout fallback, and compress_history fallback path; the empty-response branch (not summary.strip()) is not exercised so the surviving messages[0] return goes undetected.
Comments Outside Diff (1)
strix/llm/memory_compressor.py, line 177-178 (link)
When the LLM returns an empty or whitespace-only response, the code still returns messages[0] — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/llm/memory_compressor.py
Line: 177-178
Comment:
When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---### Issue 1 of 1
strix/llm/memory_compressor.py:177-178
When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.
```suggestion if not summary.strip(): return _build_fallback_summary(messages)```
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why
This addresses the reliability failure mode in #470 where long-context scans can get stuck when the memory compressor repeatedly hits LiteLLM timeouts. The fallback keeps the scan moving while retaining ordered message previews and recent operational context.
Tests